Hands-on Exercise 3B

Published

January 15, 2024

Modified

January 25, 2024

4 Programming Animated Statistical Graphics with R

4.1 Overview and Learning Outcomes

This hands-on exercise is based on Chapter 4 of the R for Visual Analytics book.

The learning outcomes are:

  • Learn how to create animated data visualisations using the gganimate and plotly packages; and

  • Learn how to reshape data using the tidyr package, and process, wrangle, and transform data using the dplyr package.

4.1.1 Basic Concepts of Animation

When creating animations, the plot does not actually move. Instead, many individual plots are built and stitched together as movie frames, just like an old-school flip book or cartoon. To convey motion, each frame is a different plot built from a relevant subset of the full data (e.g., one subset per value of a time variable). Stitched back together in order, these subsets drive the flow of the animation.
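The flip-book idea can be sketched in base R. The toy data frame below is invented for illustration (it is not the exercise dataset); split() cuts it into one subset per year, and each subset is the data that a single frame would draw.

```r
# Toy data (made up for illustration): two countries observed in three years.
toy <- data.frame(
  country = rep(c("A", "B"), times = 3),
  year    = rep(c(2000L, 2001L, 2002L), each = 2),
  value   = c(10, 20, 12, 19, 15, 17)
)

# One subset per year: each list element holds the rows behind one frame.
frames <- split(toy, toy$year)

length(frames)    # number of frames
names(frames)     # the time value each frame represents
frames[["2001"]]  # the rows that the second frame would draw
```

An animation package does this subsetting internally and then adds intermediate (tweened) frames so that the motion looks smooth.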

4.1.2 Terminology

The key concepts and terminology related to this type of animated data visualisation are:

  1. Frame: In an animated line graph, each frame represents a different point in time or a different category. When the frame changes, the data points on the graph are updated to reflect the new data.

  2. Animation Attributes: The animation attributes are the settings that control how the animation behaves. For example, one can specify the duration of each frame, the easing function used to transition between frames, and whether to start the animation from the current frame or from the beginning.
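As a rough illustration of these two terms, the sketch below uses the animation_opts() function in the plotly package on a made-up three-row data frame (the toy data and its column names are assumptions, not part of the exercise). The frame argument of plot_ly() maps a column to frames, while animation_opts() sets the frame duration, the transition duration, and the easing function.

```r
library(plotly)

# Toy data (made up): one point that moves over three steps.
toy <- data.frame(
  x    = c(1, 2, 3),
  y    = c(1, 4, 9),
  step = c(1, 2, 3)
)

p <- plot_ly(toy, x = ~x, y = ~y,
             frame = ~step,            # one frame per value of step
             type = "scatter", mode = "markers") %>%
  animation_opts(frame = 1000,         # each frame lasts 1000 ms
                 transition = 500,     # 500 ms of smooth movement between frames
                 easing = "cubic-in-out",
                 redraw = FALSE)

p
```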

4.2 Getting Started

4.2.1 Installing and Loading Required Libraries

In this hands-on exercise, the following R packages are used:

  • tidyverse (including readr, tidyr, and dplyr) for performing data science tasks such as importing, tidying, and wrangling data;

  • readxl for importing data from Excel worksheets;

  • plotly for plotting interactive statistical graphs;

  • gganimate (a ggplot2 extension) for creating animated statistical graphs;

  • gifski for converting video frames to GIF animations using pngquant’s fancy features for efficient cross-frame palettes and temporal dithering. It produces animated GIFs that use thousands of colors per frame; and

  • gapminder for providing an excerpt of the data available at Gapminder.org. Its country_colors colour scheme is used in the plots in this exercise.

The code chunk below uses the p_load() function in the pacman package to check whether each package is installed. Packages that are already installed are loaded into the R environment; missing packages are installed first and then loaded.

pacman::p_load(tidyverse, readxl,
               plotly, gganimate, 
               gifski, gapminder)
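For readers who prefer not to depend on pacman, the check-install-load logic described above can be approximated in base R. This is a minimal sketch, not how p_load() is actually implemented; load_or_install() is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper: install each package if it is missing, then attach it.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE when the package is not installed.
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage (commented out so that nothing is installed by accident):
# load_or_install(c("tidyverse", "readxl", "plotly",
#                   "gganimate", "gifski", "gapminder"))
```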

4.2.2 Importing Data

The dataset for this hands-on exercise is imported into the R environment using the read_xls() function in the readxl package and stored as the R object, globalPop.

The mutate_each_() function in the dplyr package is used to convert the character columns named in col (Country and Continent) into factors, and the mutate() function in the dplyr package is used to convert the values of the Year field into integers.

col = c("Country", "Continent")

globalPop = read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate_each_(funs(factor(.)), col) %>%
  mutate(Year = as.integer(Year))

However, the mutate_each_() function was deprecated in dplyr 0.7.0, and the funs() function was deprecated in dplyr 0.8.0. In view of this, the code chunk is rewritten using the mutate_at() function in the dplyr package. Note that mutate_at() still works but has itself been superseded by across() since dplyr 1.0.0.

col = c("Country", "Continent")

globalPop = read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate_at(col, as.factor) %>%
  mutate(Year = as.integer(Year))

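Alternatively, the across() function in the dplyr package can be used with the mutate() function to obtain the same output. The column names are wrapped in all_of() because selecting with a bare external character vector is deprecated in recent versions of tidyselect.

col = c("Country", "Continent")

globalPop = read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate(across(all_of(col), as.factor)) %>%
  mutate(Year = as.integer(Year))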
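The effect of the conversion can be checked without the Excel file. The sketch below applies the same across() pattern to a small made-up data frame that stands in for globalPop (the values are invented, and all_of() is used here to select the columns by name).

```r
library(dplyr)

# Toy stand-in for globalPop (values are made up for illustration).
toy <- data.frame(
  Country   = c("Alpha", "Beta", "Alpha", "Beta"),
  Continent = c("Asia", "Europe", "Asia", "Europe"),
  Year      = c(1996, 1996, 1997, 1997),
  stringsAsFactors = FALSE
)

cols <- c("Country", "Continent")

converted <- toy %>%
  mutate(across(all_of(cols), as.factor),  # character -> factor
         Year = as.integer(Year))          # double -> integer

# Inspect the resulting column types.
sapply(converted, class)
```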

4.2.3 Exploring Data

The tibble data frame, globalPop, has 6 columns and 6,204 rows.

  • It consists of the populations of 222 countries, across 6 continents.

  • It also shows the percentages of the Old and Young subsets of the populations.

n_distinct(globalPop$Country)
[1] 222
n_distinct(globalPop$Continent)
[1] 6

4.3 Animated Data Visualisation: gganimate Methods

The gganimate package extends the Grammar of Graphics as implemented by ggplot2 to include the description of animation. It does this by providing a range of new grammar classes that can be added to the plot object to customise how it should change with time:

  • transition_*() defines how the data should be spread out and how it relates to itself across time.

  • view_*() defines how the positional scales should change along the animation.

  • shadow_*() defines how data from other points in time should be presented in the given point in time.

  • enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.

  • ease_aes() defines how different aesthetics should be eased during transitions.
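To show how these grammar classes combine, the sketch below adds one function from each family to a single plot. It uses the gapminder data frame from the gapminder package rather than globalPop so that it runs without the Excel file; the particular choice of view_follow(), shadow_wake(), enter_fade(), and exit_fade() is an illustrative assumption, not something the exercise prescribes.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim_spec <- ggplot(gapminder,
                    aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +
  transition_time(year) +            # transition_*(): one state per year
  view_follow(fixed_y = TRUE) +      # view_*(): x-axis follows the data, y-axis stays fixed
  shadow_wake(wake_length = 0.1) +   # shadow_*(): short trail of earlier positions
  enter_fade() +                     # enter_*(): new points fade in
  exit_fade() +                      # exit_*(): removed points fade out
  ease_aes("cubic-in-out")           # ease_aes(): easing applied to all aesthetics

# Printing anim_spec would render the animation.
```

In gapminder every country is present in every year, so the enter and exit functions have no visible effect here; they matter when observations appear or disappear between states.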

4.3.1 Building A Static Bubble Plot

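A static population bubble plot is created using basic ggplot2 functions. Note that ggplot2 prints the title placeholder {frame_time} literally; it is only filled in with the current year once transition_time() is added in the next section.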

ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Year: {frame_time}', 
       x = '% Aged', 
       y = '% Young') 

4.3.2 Building An Animated Bubble Plot

To create an animated population bubble plot, the transition_time() function in the gganimate package is used to transition through distinct states in time (i.e. Year), and the ease_aes() function in the gganimate package is used to control how the aesthetics are eased between states. The default easing is ‘linear’. Other easing functions are quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce, each of which can take an -in, -out, or -in-out modifier (e.g. ‘cubic-in-out’).

ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Year: {frame_time}', 
       x = '% Aged', 
       y = '% Young') +
  transition_time(Year) +       
  ease_aes('linear')
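Printing the plot object renders it with default settings. For more control, the animate() and anim_save() functions in the gganimate package can be called explicitly. The sketch below is a hedged example rather than part of the original exercise: it uses the gapminder data frame in place of globalPop, picks a non-default easing, and writes to a temporary file whose name is arbitrary.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

p <- ggplot(gapminder,
            aes(x = gdpPercap, y = lifeExp,
                size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +
  ease_aes("cubic-in-out")   # slow start and slow end between states

# Render 24 frames at 8 frames per second using the gifski renderer.
anim <- animate(p, nframes = 24, fps = 8,
                renderer = gifski_renderer())

# Save the rendered animation (the file name is arbitrary).
out_file <- file.path(tempdir(), "gapminder_bubble.gif")
anim_save(out_file, animation = anim)
```

Raising nframes gives smoother motion at the cost of a longer rendering time and a larger file.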

4.4 Animated Data Visualisation: plotly Methods

Both the plot_ly() and ggplotly() functions in the plotly package can be used to support key frame animations through the “frame” aesthetic argument. They also support the “ids” aesthetic argument to ensure smooth transitions between objects with the same id (which helps facilitate object constancy).
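The role of the two arguments can be seen on a tiny example. In the sketch below (toy data invented for illustration), frame assigns each row to a step of the animation and ids tells plotly which marker in one frame corresponds to which marker in the next, so that each point is moved rather than redrawn.

```r
library(plotly)

# Toy data (made up): two objects, "a" and "b", observed at three steps.
toy <- data.frame(
  id   = rep(c("a", "b"), times = 3),
  step = rep(1:3, each = 2),
  x    = c(1, 3, 2, 2, 3, 1),
  y    = c(1, 1, 2, 3, 3, 5)
)

p_ids <- plot_ly(toy, x = ~x, y = ~y,
                 frame = ~step,   # which frame each row belongs to
                 ids = ~id,       # links the same object across frames
                 type = "scatter", mode = "markers")

p_ids
```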

4.4.1 Building An Animated Bubble Plot Using ggplotly() Method

An animated population bubble plot is created using the ggplotly() function to convert the static graphic object into an animated svg object.

gg = ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),
             alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young')

ggplotly(gg)

Although the “show.legend” argument was set to FALSE, the legend still appears in the plot produced by ggplotly(). To overcome this problem, the theme() function is used with the “legend.position” argument set to ‘none’.

gg = ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),
             alpha = 0.7) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young') + 
  theme(legend.position='none')

ggplotly(gg)

4.4.2 Building An Animated Bubble Plot Using plot_ly() Method

An animated population bubble plot is created using the plot_ly() function.

bp = globalPop %>%
  plot_ly(x = ~Old, 
          y = ~Young, 
          size = ~Population, 
          color = ~Continent,
          sizes = c(2, 100),
          frame = ~Year, 
          text = ~Country, 
          hoverinfo = "text",
          type = 'scatter',
          mode = 'markers') %>%
  layout(showlegend = FALSE)

bp
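The play button and slider that plotly adds to an animated plot can also be adjusted. The sketch below is an optional extension, not part of the original exercise: it uses the gapminder data frame in place of globalPop and the animation_button() and animation_slider() functions in the plotly package, with positions and a prefix chosen purely for illustration.

```r
library(plotly)
library(gapminder)

bp_demo <- gapminder %>%
  plot_ly(x = ~gdpPercap,
          y = ~lifeExp,
          size = ~pop,
          color = ~continent,
          frame = ~year,
          ids = ~country,          # keep each country's marker constant
          text = ~country,
          hoverinfo = "text",
          type = "scatter",
          mode = "markers") %>%
  layout(xaxis = list(type = "log"),
         showlegend = FALSE) %>%
  # Move the play button to the bottom-right corner.
  animation_button(x = 1, xanchor = "right",
                   y = 0, yanchor = "bottom") %>%
  # Prefix the slider's current value with a label.
  animation_slider(currentvalue = list(prefix = "Year: "))

bp_demo
```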

~~~ End of Hands-on Exercise 3B ~~~